Background. The innovation process in hematology requires access to a large amount of healthcare data. However, 97% of patient data produced by hospitals remains unused (Source: Deloitte, Health Data, 2023), primarily due to privacy limitations, lack of data harmonization from different sources, and the unstructured and dispersed nature of the information.
Large Language Models (LLMs) are computational models capable of general-purpose language generation and other natural language processing tasks. They acquire these abilities by learning statistical relationships from vast amounts of text through a computationally intensive self-supervised and semi-supervised training process. LLMs are increasingly used in healthcare to enhance diagnostics, streamline patient interactions, and improve overall clinical workflows. In this project, we analyze the potential of Artificial Intelligence (AI) solutions based on LLMs for data retrieval, extraction and generation, with the goal of creating standardized datasets that accelerate clinical and translational research in blood diseases.
Aims. The “David vs Goliath” study was conducted by the Synthema EU consortium with the following aims: 1) to develop an AI solution leveraging LLMs for information retrieval, extraction and generation of research-ready datasets from multiple medical sources; 2) to evaluate the clinical and statistical fidelity of the AI-retrieved dataset through a specific Validation Framework (VF); 3) to validate the reliability of the AI-retrieved dataset for building personalized prognostic models.
Methods. We proposed ARISTOTELES, an Automatic Retrieval Information System TO acceleraTE clinical and translationaL research in hEmatological malignancieS. The solution has three main components: a Retrieval-Augmented Generation (RAG) system for information retrieval; an LLM for data extraction; and a Generative Pretrained Transformer (GPT) for missing data inference. The RAG component, which itself leverages a dedicated LLM, was implemented to search for information across multiple data sources in the Humanitas Research Hospital DataLake (a fully privacy-compliant environment) and to enhance the quality of the retrieved information. The information provided by the RAG component was then extracted by a second, hematology-tuned LLM into a structured dataset in common data model format. Finally, the GPT model, trained on hematological data, was used to generate complete data conditioned on the partially extracted patient information.
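The three-stage data flow described above (retrieve, extract, infer missing values) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in chosen for illustration, not the ARISTOTELES implementation: keyword overlap replaces the RAG retriever, regular expressions replace the extraction LLM, cohort-level defaults replace the conditional GPT model, and the function names, field names and example notes are invented toy data.

```python
import re

def retrieve(query_terms, documents, top_k=2):
    """Stage 1 stand-in: rank documents by the number of query terms they contain."""
    scored = [
        (sum(term.lower() in doc.lower() for term in query_terms), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most top_k documents, dropping those with no matching term.
    return [doc for score, doc in scored[:top_k] if score > 0]

def extract(passages):
    """Stage 2 stand-in: pull fields out of free text into a structured record."""
    record = {"age": None, "hemoglobin": None}
    text = " ".join(passages)
    age = re.search(r"(\d+)-year-old", text)
    if age:
        record["age"] = int(age.group(1))
    hb = re.search(r"hemoglobin\s+(\d+(?:\.\d+)?)", text, re.IGNORECASE)
    if hb:
        record["hemoglobin"] = float(hb.group(1))
    return record

def infer_missing(record, cohort_defaults):
    """Stage 3 stand-in: fill fields that extraction left empty."""
    return {
        key: value if value is not None else cohort_defaults[key]
        for key, value in record.items()
    }

# Toy, made-up notes; not real patient data.
notes = [
    "A 67-year-old patient with MDS.",
    "Appointment rescheduled to Monday.",
    "Bone marrow biopsy performed.",
]
passages = retrieve(["patient", "hemoglobin"], notes)
partial = extract(passages)
complete = infer_missing(partial, {"age": 70, "hemoglobin": 9.5})
```

Unlike the default fill used here, the generative model described in the abstract conditions on the partially extracted information for each patient; the sketch only shows how the output of one stage becomes the input of the next.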
Results. The original dataset (Goliath) comprises 1167 patients with myeloid neoplasms (MN) from Humanitas Research Hospital and includes multiple layers of information: comprehensive demographic, clinical and genomic data (cytogenetics and mutational screening), alongside treatment and outcome. The AI-retrieved dataset (David) was generated by applying ARISTOTELES to medical records from the same MN patients.
The two datasets were compared using a specific validation framework (based on PMID:38875514). Distributions and correlations of clinical, demographic, genomic and cytogenetic features were comparable between the two datasets, with 91% fidelity. Mutation distribution and pairwise associations among genes and/or cytogenetic abnormalities showed 90.1% fidelity.
No statistically significant difference between the two datasets was observed when comparing Kaplan-Meier survival curves with a log-rank test in patients stratified according to clinical labels. Finally, we performed Cox proportional hazards (Cox-PH) analyses including clinical and genomic information from the David and Goliath datasets to compare their performance (concordance index, CI). Considering overall survival as the clinical endpoint, the CI of the Cox-PH models was 0.75 and 0.74, respectively.
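For readers unfamiliar with the concordance index, the sketch below shows how it can be computed from follow-up times, event indicators and model risk scores. It assumes Harrell's formulation with a simplified treatment of ties (pairs with equal follow-up times are skipped); the abstract does not state which estimator was used, and the function name and toy inputs are illustrative only.

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times       -- follow-up time for each patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # follow-up ended with an observed event rather than censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk, earlier event: correct ordering
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data, not taken from the study: the third patient is censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.4, 0.1])
partial_c = concordance_index(toy_times, toy_events, [0.9, 0.1, 0.4, 0.7])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so two models with indices of 0.75 and 0.74 rank patients by risk with nearly the same accuracy.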
Conclusion. The ARISTOTELES solution enables automatic retrieval, extraction and structuring of complex multimodal healthcare data. The AI-retrieved data (David) showed high clinical and statistical fidelity with respect to the original dataset (Goliath). Overall, this increases access to healthcare data and reduces the human effort required for data collection tasks, thereby accelerating clinical research in hematology.
Santoro:Celgene: Speakers Bureau; Amgen: Speakers Bureau; Abb-vie: Speakers Bureau; Roche: Speakers Bureau; Takeda: Speakers Bureau; Astrazeneca: Speakers Bureau; Arqule: Speakers Bureau; Lilly: Speakers Bureau; Sandoz: Speakers Bureau; Novartis: Speakers Bureau; Beigene: Speakers Bureau; MSD: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Bayer: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; EISAI: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Pfizer: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Gilead: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Servier: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; BMS: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Incyte: Consultancy; Sanofi: Consultancy. Santini:BMS/Celgene: Membership on an entity's Board of Directors or advisory committees; AbbVie: Membership on an entity's Board of Directors or advisory committees; CTI: Membership on an entity's Board of Directors or advisory committees; Geron: Membership on an entity's Board of Directors or advisory committees; Keros: Membership on an entity's Board of Directors or advisory committees; Jazz: Membership on an entity's Board of Directors or advisory committees; Novartis: Membership on an entity's Board of Directors or advisory committees; Servier: Membership on an entity's Board of Directors or advisory committees; Syros: Membership on an entity's Board of Directors or advisory committees. 
Platzbecker:Curis: Consultancy, Honoraria, Research Funding; Geron: Consultancy; Amgen: Consultancy, Research Funding; Abbvie: Consultancy, Research Funding; Janssen: Consultancy, Honoraria, Research Funding; Merck: Research Funding; MDS Foundation: Membership on an entity's Board of Directors or advisory committees; BMS: Consultancy, Membership on an entity's Board of Directors or advisory committees, Other: Travel support, Research Funding; Novartis: Consultancy, Research Funding. Fenaux:Astex: Research Funding; Servier: Research Funding; Agios: Research Funding; Novartis: Research Funding; Jazz Pharmaceuticals: Honoraria, Research Funding; Janssen: Research Funding; AbbVie: Honoraria, Research Funding; BMS: Honoraria, Research Funding. Diez-Campelo:ASTEX/OTSUKA: Membership on an entity's Board of Directors or advisory committees, Other: TRAVEL TO MEETINGS; CURIS: Membership on an entity's Board of Directors or advisory committees; SYROS: Membership on an entity's Board of Directors or advisory committees; HEMAVAN: Membership on an entity's Board of Directors or advisory committees; AGIOS: Consultancy, Membership on an entity's Board of Directors or advisory committees; BLUEPRINT MEDICINES: Consultancy, Membership on an entity's Board of Directors or advisory committees; KEROS: Honoraria, Membership on an entity's Board of Directors or advisory committees; Novartis: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees; GSK: Consultancy, Membership on an entity's Board of Directors or advisory committees; Gilead: Other: Travel reimbursement; BMS/Celgene: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Advisory board fees. 
Komrokji:Celgene/BMS: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Servier: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Servio: Membership on an entity's Board of Directors or advisory committees; Servio: Honoraria; Genentech: Consultancy; Keros: Membership on an entity's Board of Directors or advisory committees; BMS: Research Funding; Novartis: Membership on an entity's Board of Directors or advisory committees; Geron: Consultancy, Membership on an entity's Board of Directors or advisory committees; Janssen: Consultancy; AbbVie: Consultancy, Membership on an entity's Board of Directors or advisory committees; DSI: Consultancy, Membership on an entity's Board of Directors or advisory committees; Jazz Pharmaceuticals: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Rigel: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Sobi: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Sumitomo Pharma: Consultancy, Membership on an entity's Board of Directors or advisory committees; Taiho: Membership on an entity's Board of Directors or advisory committees; CTI biopharma: Membership on an entity's Board of Directors or advisory committees; DSI: Honoraria, Membership on an entity's Board of Directors or advisory committees; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees; PharmaEssentia: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau. 
Garcia-Manero:Onconova: Research Funding; H3 Biomedicine: Research Funding; Astex: Other: Personal fees; Bristol Myers Squibb: Other: Personal fees, Research Funding; Genentech: Research Funding; AbbVie: Research Funding; Novartis: Research Funding; Helsinn: Research Funding; Forty Seven: Research Funding; Aprea: Research Funding; Janssen: Research Funding; Curis: Research Funding; Merck: Research Funding; Helsinn: Other: Personal fees; Astex: Research Funding; Amphivena: Research Funding; Genentech: Other: Personal fees. Kordasti:Pfizer: Consultancy, Speakers Bureau; Beckman Coulter: Speakers Bureau; MorphoSys: Research Funding; Alexion: Consultancy; API: Consultancy; Boston Biomed: Consultancy; Celgene: Research Funding; Novartis: Consultancy, Honoraria, Research Funding, Speakers Bureau. Della Porta:Bristol Myers Squibb: Consultancy.